Learning Evaluation Functions for Large Acyclic Domains

Authors

  • Justin A. Boyan
  • Andrew W. Moore
Abstract

Some of the most successful recent applications of reinforcement learning have used neural networks and the TD(λ) algorithm to learn evaluation functions. In this paper, we examine the intuition that TD(λ) operates by approximating asynchronous value iteration. We note that on the important subclass of acyclic tasks, value iteration is inefficient compared with another graph algorithm, DAG-SP, which assigns values to states by working strictly backwards from the goal. We then present ROUT, an algorithm analogous to DAG-SP that can be used in large stochastic state spaces requiring function approximation. We close by comparing the behavior of ROUT and TD on a simple example domain and on two domains with much larger state spaces.
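The idea of assigning values by working strictly backwards from the goal can be illustrated with a minimal sketch. This is not the paper's code: it assumes a small deterministic acyclic graph with known edge costs, and the function name `dag_shortest_path_values` and the example graph are hypothetical. Each state is backed up exactly once, after all of its successors, which is the property that makes this approach cheaper than repeated value-iteration sweeps on acyclic tasks.

```python
from graphlib import TopologicalSorter


def dag_shortest_path_values(successors, goal):
    """Cost-to-goal for every state of a deterministic DAG.

    successors maps state -> list of (next_state, cost) pairs (assumed format).
    """
    # TopologicalSorter expects node -> nodes that must come first.
    # Passing successors in that role makes every successor precede its
    # parent, so the goal end of the graph is processed first.
    graph = {s: [n for n, _ in edges] for s, edges in successors.items()}
    values = {}
    for state in TopologicalSorter(graph).static_order():
        if state == goal:
            values[state] = 0.0
            continue
        edges = successors.get(state, [])
        # One backup per state: successor values are already final.
        # A non-goal dead end gets infinite cost.
        values[state] = min(
            (cost + values[nxt] for nxt, cost in edges),
            default=float("inf"),
        )
    return values


# Hypothetical four-state example with goal "G".
example_graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 1), ("G", 5)],
    "C": [("G", 1)],
    "G": [],
}
example_values = dag_shortest_path_values(example_graph, "G")
```

In this example, `example_values["A"]` is the cost of the cheapest path from A to G. According to the abstract, ROUT carries an analogous backward ordering over to large stochastic state spaces with function approximation, which this sketch does not attempt.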


Similar resources

Peer Assessment in evaluation of Medical sciences students

Introduction: Peer assessment has recently drawn particular attention as a method of progress evaluation. Although the method is well known, it is still novel in many countries that rely on traditional methods. This review article therefore addresses peer assessment in medical education. Methods: The documents related to peer assessment, its advantages, disadvantages, applications and how to use it extracte...


Learning Evaluation Functions to Improve Local Search

This paper describes Stage, a learning algorithm that automatically improves search performance on large-scale optimization problems. Stage learns an evaluation function that predicts the outcome of a local search algorithm, such as hillclimbing or Walksat, from features of states visited during search. The learned evaluation function is used to bias future search trajectories toward better opt...


Transitive orderings of properties of utility functions

This note considers orderings of properties (or assumptions) on utility functions and specifies domains on which those orderings are transitive or acyclic.


Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data

Paucity of large curated hand-labeled training data for every domain-of-interest forms a major bottleneck in the deployment of machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time. In this work, we present Adversaria...


Learning Evaluation Functions to Improve Optimization by Local Search

This paper describes algorithms that learn to improve search performance on large-scale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome of a local search algorithm, such as hillclimbing or Walksat, from features of states visited during search. The learned evaluation function is then used to bias future search trajectories toward ...




Publication date: 1996